Smart Paper Notes

Donut: Document Understanding Transformer without OCR

Geewook Kim, Teakgyu Hong, Moonbin Yim, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park

Categories: Machine Learning | Artificial Intelligence

2021-11-30

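Understanding document images (e.g., invoices) is an important research topic with many applications in document processing automation. Driven by recent advances in deep-learning-based Optical Character Recognition (OCR), current Visual Document Understanding (VDU) systems are designed around OCR. Although such OCR-based approaches promise reasonable performance, they suffer from critical problems induced by the OCR, e.g., (1) expensive computational costs and (2) performance degradation due to OCR error propagation. In this paper, we propose a novel VDU model that is end-to-end trainable without an underlying OCR framework. To this end, we propose a new task and a synthetic document image generator to pre-train the model, mitigating the dependency on large-scale real document images. Our approach achieves state-of-the-art performance on various document understanding tasks in public benchmark datasets and private industrial service datasets. Through extensive experiments and analyses, we demonstrate the effectiveness of the proposed model, especially with real-world applications in mind.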

Holonomic Control of Arbitrary Configurations of Docked Modboats

Zhijie Qiao, Gedaliah Knizhnik, Mark Yim

Categories: Robotics

2022-11-29

The Modboat is a low-cost, underactuated, modular robot capable of surface swimming, docking to other modules, and undocking from them using only a single motor and two passive flippers. Undocking is achieved by causing intentional self-collision between the tails of neighboring modules in certain configurations; this becomes a challenge, however, when collective swimming as one connected component is desirable. Prior work has developed controllers that turn arbitrary configurations of docked Modboats into steerable vehicles, but they cannot counteract lateral forces and disturbances. In this work we present a centralized control strategy to create holonomic vehicles out of arbitrary configurations of docked Modboats using an iterative potential-field based search. We experimentally demonstrate that our controller performs well and can control surge and sway velocities and yaw angle simultaneously.

Technical Report on Web-based Visual Corpus Construction for Visual Document Understanding

Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoonsik Kim, Geewook Kim

Categories: Computer Vision | Artificial Intelligence | Machine Learning

2022-11-07

We present a dataset generator engine named Web-based Visual Corpus Builder (Webvicob). Webvicob can readily construct a large-scale visual corpus (i.e., images with text annotations) from a raw Wikipedia HTML dump. In this report, we validate that Webvicob-generated data can cover a wide range of context and knowledge and helps practitioners to build a powerful Visual Document Understanding (VDU) backbone. The proposed engine is publicly available at https://github.com/clovaai/webvicob.

Collective Control for Arbitrary Configurations of Docked Modboats

Gedaliah Knizhnik, Mark Yim

Categories: Robotics

2022-09-08

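The Modboat is a low-cost, underactuated, modular robot capable of surface swimming, docking to other modules, and undocking from them using only a single motor and two passive flippers. Undocking is achieved by causing intentional self-collision between the tails of neighboring modules in certain configurations; this becomes a challenge, however, when collective swimming as one connected component is desirable. In this work, we develop a centralized control strategy that allows arbitrary configurations of Modboats to swim as a single steerable vehicle and guarantees no accidental undocking. We also present a simplified model of the hydrodynamic interactions between boats in a configuration that is tractable for real-time control. We experimentally demonstrate that our controller performs well, is consistent across configurations of various sizes and shapes, and can control surge velocity and yaw angle simultaneously. Controllability is maintained while swimming, but pure yaw control causes lateral movement that cannot be counteracted by the presented framework.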

Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base

Jinyeong Chae, Jihie Kim

Categories: Computer Vision | Artificial Intelligence | Natural Language Processing | Machine Learning

2022-07-27

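The knowledge-based visual question answering (KVQA) task aims to answer questions that require additional external knowledge as well as an understanding of the image and the question. Recent studies on KVQA inject external knowledge in a multi-modal form, and as more knowledge is used, irrelevant information may be added and can confuse the question answering. To use the knowledge properly, this study proposes the following: 1) we introduce a novel semantic inconsistency measure computed from caption uncertainty and semantic similarity; 2) we suggest a new external knowledge assimilation method based on the semantic inconsistency measure and apply it to integrate explicit knowledge and implicit knowledge for KVQA; 3) the proposed method is evaluated on the OK-VQA dataset and achieves state-of-the-art performance.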

Ray-Space Motion Compensation for Lenslet Plenoptic Video Coding

Thuc Nguyen Huu, Vinh Van Duong, Jonghoon Yim, Byeungwoo Jeon

Categories: Computer Vision

2022-07-01

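Plenoptic images and videos, which carry rich information, demand a large amount of data storage and incur high transmission costs. While plenoptic image coding has been studied extensively, research on plenoptic video coding has been very limited. We investigate motion compensation for plenoptic video coding by looking at the problem in the ray-space domain instead of the conventional pixel domain. Here, we develop a novel motion compensation scheme for lenslet video under two sub-cases of ray-space motion, namely integer ray-space motion and fractional ray-space motion. The proposed scheme of light field motion-compensated prediction is designed so that it can be easily integrated into well-known video coding techniques such as HEVC. Compared with existing methods, experimental results show remarkable compression efficiency, with an average gain of 19.63% and a peak gain of 29.1%.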

Confidence Score for Source-Free Unsupervised Domain Adaptation

Jonghyun Lee, Dahuin Jung, Junho Yim, Sungroh Yoon

Categories: Computer Vision | Machine Learning

2022-06-14

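Source-free unsupervised domain adaptation (SFUDA) aims to obtain high performance in the unlabeled target domain using the pre-trained source model rather than the source data. Existing SFUDA methods assign the same importance to all target samples, which makes them vulnerable to incorrect pseudo-labels. To differentiate sample importance, in this study we propose a novel sample-wise confidence score for SFUDA, the Joint Model-Data Structure (JMDS) score. Unlike existing confidence scores that use only one of the source or target domain knowledge, the JMDS score uses both. We then propose a Confidence score Weighting Adaptation using the JMDS (CoWA-JMDS) framework for SFUDA. CoWA-JMDS consists of the JMDS scores as sample weights and weight Mixup, our proposed variant of Mixup. Weight Mixup encourages the model to make more use of the target domain knowledge. The experimental results show that the JMDS score outperforms existing confidence scores. Moreover, CoWA-JMDS achieves state-of-the-art performance in various SFUDA scenarios: closed-set, open-set, and partial-set scenarios.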

Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem

Brian L. Trippe, Jason Yim, Doug Tischer, Tamara Broderick, David Baker, Regina Barzilay, Tommi Jaakkola

Categories: Machine Learning | Machine Learning (Statistics)

2022-06-08

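Constructing a scaffold structure that supports a desired motif, conferring protein function, shows promise for the design of vaccines and enzymes. However, a general solution to this motif-scaffolding problem remains open. Current machine-learning techniques for scaffold design are either limited to unrealistically small scaffolds (up to length 20) or struggle to produce multiple diverse scaffolds. We propose to learn a distribution over diverse protein backbone structures via an E(3)-equivariant graph neural network. We develop SMCDiff to efficiently sample scaffolds from this distribution conditioned on a given motif; our algorithm theoretically guarantees conditional samples from a diffusion model in the large-compute limit. We evaluate our designed backbones by how well they align with AlphaFold2-predicted structures. We show that our method can (1) sample scaffolds of up to 80 residues and (2) achieve structurally diverse scaffolds for a fixed motif.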

Translating Hanja Historical Documents to Contemporary Korean and English

Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-05-20

The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. The resulting translation was, however, too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. Since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. Thus, we propose H2KE, a neural machine translation model that translates historical documents in Hanja into more easily understandable Korean and into English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document written in Hanja from both a full dataset of outdated Korean translations and a small dataset of more recently translated contemporary Korean and English. We compare our method against two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer-based model trained only on newly translated corpora. The experiments reveal that our method significantly outperforms the baselines in terms of BLEU scores for both contemporary Korean and English translations. We further conduct an extensive human evaluation, which shows that our translation is preferred over the original expert translations by both experts and non-expert Korean speakers.

Amplitude Control for Parallel Lattices of Docked Modboats

Gedaliah Knizhnik, Mark Yim

Categories: Robotics

2022-03-01

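The Modboat is a low-cost, underactuated, modular robot capable of surface swimming. It is able to swim individually, dock to other Modboats, and undock from them using only a single motor and two passive flippers. Undocking without additional actuation is achieved by causing intentional self-collision between the tails of neighboring modules; this becomes a challenge when group swimming as one connected component is desirable. In this work, we develop a control strategy that allows parallel lattices of Modboats to swim as a single unit, which conventionally requires holonomic modules. We show that the control strategy is guaranteed to avoid unintentional undocking and minimizes internal forces within the lattice. Experimental verification shows that the controller performs well and is consistent for lattices of various sizes. Controllability is maintained while swimming, but pure yaw control causes lateral movement that cannot be counteracted by the presented framework.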
